Good Answers from Bad Data: A Data Management Strategy

Author

  • Michael D. Siegel
Abstract

Data error is an obstacle to effective application, integration, and sharing of data. Error reduction, although desirable, is not always necessary or feasible. Error measurement is a natural alternative. In this paper, we outline an error propagation calculus which models the propagation of an error representation through queries. A closed set of three error types is defined: attribute value inaccuracy (and nulls), object mismembership in a class, and class incompleteness. Error measures are probability distributions over these error types. Given measures of error in query inputs, the calculus both computes and "explains" error in query outputs, so that users and administrators better understand data error. Error propagation is non-trivial as error may be amplified or diminished through query partitions and aggregations. As a theoretical foundation, this work suggests managing error in practice by instituting measurement of persistent tables and extending database output to include a quantitative error term, akin to the confidence interval of a statistical estimate. Two theorems assert the completeness of our error representation.
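To make the idea of a quantitative error term on query output concrete, here is a minimal sketch for a single COUNT query. It is not the paper's calculus: it collapses the probability distributions described above into two point rates, ignores attribute value inaccuracy, and the names `TableError` and `corrected_count` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableError:
    """Hypothetical error measure for one stored table (point rates only)."""
    mismembership: float   # fraction of stored rows that do not belong to the class
    incompleteness: float  # fraction of true class members missing from the table


def corrected_count(observed_rows: int, err: TableError) -> float:
    """Estimate the true class size behind an observed COUNT(*).

    Assumption: rows that do not belong are dropped first, then the
    remaining rows are scaled up to account for missing members.
    """
    if not 0.0 <= err.mismembership <= 1.0:
        raise ValueError("mismembership must be in [0, 1]")
    if not 0.0 <= err.incompleteness < 1.0:
        raise ValueError("incompleteness must be in [0, 1)")
    valid_rows = observed_rows * (1.0 - err.mismembership)
    return valid_rows / (1.0 - err.incompleteness)


def count_error_term(observed_rows: int, err: TableError) -> float:
    """Signed difference between the estimated true count and the observed count."""
    return corrected_count(observed_rows, err) - observed_rows


if __name__ == "__main__":
    err = TableError(mismembership=0.25, incompleteness=0.5)
    print(corrected_count(1000, err))   # estimated true class size
    print(count_error_term(1000, err))  # error term to report with the count
```

In this toy model the two error types pull the count in opposite directions, which is one simple way error can be amplified or diminished by an aggregation.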

Similar resources

ECNU: Using Multiple Sources of CQA-based Information for Answers Selection and YES/NO Response Inference

This paper reports our submissions to community question answering task in SemEval2015, which consists of two subtasks: (1) predict the quality of answers to given question as good, bad, or potentially relevant and (2) identify yes, no or unsure response to a given YES/NO question based on the good answers identified by subtask 1. For both subtasks, we adopted supervised classification method a...

Medical Errors Disclosure: Is It Good or Bad?

Background: In the treatment and health process, there are a lot of dangers to patients, and the increased number of medical errors is one of the most important circumstances of this process. Objective: The present research purposed to decrease medical errors through disclosure of them in hospitals of Tehran University of Medical Sciences. M...

An ANP and MULTIMOORA-Based SWOT Analysis for Strategy Formulation

Since no organization can have unlimited resources, strategists should decide on a strategy that can provide the greatest benefits to the organization. Decisions on strategy formulation commit the organization to produce specific products, work in specific markets, and exploit certain resources and technologies for a relatively long time. Strategies dictate the long-term competitive advantages ...

The pits and falls of graphical presentation

Graphics are powerful tools to communicate research results and to gain information from data. However, researchers should be careful when deciding which data to plot and the type of graphic to use, as well as other details. The consequence of bad decisions in these features varies from making research results unclear to distortions of these results, through the creation of "chartjunk" with use...

Parental self-support: A study of parents' confront strategy when giving birth to premature infants

Background: This study aimed to understand the confront strategies of parents of premature infants hospitalized in NICU. Methods: This study was performed using qualitative content analysis approach. Twelve participants including nine parents whose infants were hospitalized in NICU, two nurses and one physician, all selected by purposive sampling method were interviewed by a female expert ...

Publication date: 1995